AITopics | true reward true reward

Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach

Jiang, Haoming, Dai, Bo, Yang, Mengjiao, Zhao, Tuo, Wei, Wei

arXiv.org Artificial Intelligence, Feb-28-2021

Reliable automatic evaluation of dialogue systems in an interactive environment is long overdue. An ideal environment for evaluating dialog systems, also known as the Turing test, needs to involve human interaction, which is usually not affordable for large-scale experiments. Though researchers have attempted to use metrics from language generation tasks (e.g., perplexity, BLEU) or model-based reinforcement learning methods (e.g., self-play evaluation) for automatic evaluation, these methods show only a very weak correlation with actual human evaluation in practice. To bridge this gap, we propose a new framework named ENIGMA for estimating human evaluation scores based on recent advances in off-policy evaluation in reinforcement learning. ENIGMA requires only a handful of pre-collected experience data and does not involve human interaction with the target policy during evaluation, making automatic evaluation feasible. More importantly, ENIGMA is model-free and agnostic to the behavior policies used to collect the experience data (see details in Section 2), which significantly alleviates the technical difficulties of modeling complex dialogue environments and human behaviors. Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
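The abstract compares evaluation methods by how well their scores correlate with human evaluation scores. A minimal sketch of that comparison, assuming Pearson correlation as the measure (the abstract does not say which correlation coefficient is used) and a hypothetical helper name `pearson`, might look like this. It is not ENIGMA itself, only the yardstick used to judge an automatic evaluator.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long score lists.

    xs: automatic scores, one per dialog system (hypothetical input).
    ys: human evaluation scores for the same systems (hypothetical input).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)
```

An automatic evaluator whose scores rank systems the same way humans do would yield a value near 1; a value near 0 corresponds to the "very weak correlation" the abstract attributes to existing metrics.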

evaluation, experience data, true reward true reward

arXiv:2102.10242

Country:

North America > United States > Pennsylvania (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Policy-Aware Model Learning for Policy Gradient Methods

Abachi, Romina, Ghavamzadeh, Mohammad, Farahmand, Amir-massoud

arXiv.org Artificial Intelligence, Feb-28-2020

A model-based reinforcement learning (MBRL) agent gradually learns a model of the environment as it interacts with it, and uses the learned model to plan and find a good policy. This can be done by planning with samples coming from the model, instead of or in addition to the samples from the environment, e.g., Sutton (1990); Peng & Williams (1993); Sutton et al. (2008); Deisenroth et al. (2015); Talvitie (2017); Ha & Schmidhuber (2018). If learning a model is easier than learning the policy or value function in a model-free manner, MBRL will reduce the number of required interactions with the real world and improve the sample complexity of the agent. However, this is contingent on the ability of the agent to learn an accurate model of the real environment. Therefore, the problem of learning a good model of the environment is of paramount importance to the success of MBRL. This paper addresses the question of how to approach the problem of learning a model of the environment, and proposes a method called policy-aware model learning (PAML). The conventional approach to model learning in MBRL is to learn a model that is a good predictor of the environment. If the learned model is accurate enough, this leads to a value function or a policy that is close to the optimal one. Learning a good predictive model can be achieved by minimizing some form of a probabilistic loss.
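The conventional pipeline described above (fit a predictive model, then plan with it) can be sketched for a small tabular problem. This is an illustration under stated assumptions, not PAML and not code from the paper: the function names `fit_tabular_model` and `plan`, the `(state, action, reward, next_state)` tuple layout, and the choice of value iteration as the planner are all hypothetical. For a categorical transition model, the count-based estimate below is the one that minimizes the negative log-likelihood, which is one common form of probabilistic loss.

```python
from collections import defaultdict


def fit_tabular_model(transitions):
    """Maximum-likelihood tabular model from (s, a, r, s_next) tuples.

    Returns P, R where P[(s, a)] maps next states to probabilities and
    R[(s, a)] is the mean observed reward. Normalized counts minimize
    the negative log-likelihood of the observed next states.
    """
    counts = defaultdict(lambda: defaultdict(int))
    reward_sums = defaultdict(float)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sums[(s, a)] += r
    P, R = {}, {}
    for sa, next_counts in counts.items():
        total = sum(next_counts.values())
        P[sa] = {s2: c / total for s2, c in next_counts.items()}
        R[sa] = reward_sums[sa] / total
    return P, R


def plan(P, R, gamma=0.9, iters=200):
    """Value iteration on the learned model (no environment access).

    States with no observed actions are treated as terminal (value 0).
    """
    states = {s for s, _ in P} | {s2 for d in P.values() for s2 in d}
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        for s in states:
            # One-step lookahead for every action observed in state s.
            q_values = [
                R[(s0, a)] + gamma * sum(p * V[s2] for s2, p in P[(s0, a)].items())
                for (s0, a) in P
                if s0 == s
            ]
            if q_values:
                V[s] = max(q_values)
    return V


# Hypothetical experience: from state 0, action "a" reaches state 1
# (reward 1) three times out of four and stays in state 0 otherwise.
data = [(0, "a", 1.0, 1)] * 3 + [(0, "a", 0.0, 0)]
P, R = fit_tabular_model(data)
V = plan(P, R)
```

The planner only ever sees `P` and `R`, so any error in the fitted model carries straight into the value function, which is the dependence on model accuracy that the abstract emphasizes.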

dimension, paml, value function

arXiv:2003.0003

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
